Adaptive artificial vision method and system



BACKGROUND OF THE INVENTION 

1. Field of the invention 

The present invention relates to an adaptive artificial vision method 
and to an adaptive artificial vision system. 

2. Technical background 

In most artificial vision systems, objects are either given in advance or
they do not exist.

Some systems perform image segmentation based on the particular
characteristics of an image (color, boundaries, etc.). These systems have no
notion of objects. They just extract regions which seem interesting in
themselves. They work well if the background on which the "objects" are
presented is known or strongly constrained (e.g. colored objects on a white
floor). In such cases the automatically extracted segments can be
considered as the "contours" of some objects.

Other systems perform object identification given a set of predefined
objects that they use as models. If the models are of sufficiently good
quality, the performance of such systems can be very good. See for example
the book by S. Ullman entitled "High-level vision: object recognition
and visual cognition", MIT Press, Boston, MA, USA, 1996.

Unfortunately, in some situations neither of these two conditions can
be met. This is particularly true in the case of robots evolving in a natural,
unknown environment and trying to discover the "objects" present without
knowing them in advance. In such cases, segmenting and recognizing
objects becomes a bootstrapping problem that can be summarized as
follows:

• Segmentation algorithms do not work well in real-life conditions if no 
template of the objects is provided. 

• Templates of the objects cannot be built without a good segmentation 
algorithm. 

This situation leads to a technological deadlock. 
SUMMARY OF THE INVENTION

The present invention aims at overcoming the above-mentioned
drawbacks and at enabling efficient bootstrapping of artificial visual
recognition, even in an unknown environment where objects which are not
known in advance are present.

These aims are obtained by an adaptive artificial vision 
method comprising the following steps: 

(a) defining successive couples of synchronized timesteps (t-1, t;
t, t+1; ...) such that the time difference τ between the two synchronized
timesteps of a couple of synchronized timesteps is equal to a predetermined
time delay τ_0,

(b) comparing two successive images (I_{t-1}, I_t; I_t, I_{t+1}; ...) at each
couple of synchronized timesteps (t-1, t; t, t+1; ...) spaced by said
predetermined time delay τ_0 to obtain a delta image Δ_t, which is the
result of computing the distance between each pair of corresponding pixels
of said two successive images, in order to characterize movements of
objects between said two successive images,

(c) extracting features from said delta image Δ_t to obtain a
potential dynamic patch P_t, which is compared with dynamic patches
previously recorded in a first repertory R_d that is progressively constructed
in real time starting from an initially empty repertory,

(d) selecting the closest dynamic patch D_i in the first repertory R_d
or, if no sufficiently close dynamic patch exists yet, adding the potential
dynamic patch P_t to the first repertory R_d, thereby obtaining and
storing a dynamic patch D_i from the comparison of two successive images
(I_{t-1}, I_t; I_t, I_{t+1}; ...) at each couple of synchronized timesteps
(t-1, t; t, t+1; ...), and

(e) temporally integrating the stored dynamic patches D_i of the first
repertory R_d in order to detect and store stable sets of active dynamic
patches representing a characterization of a reoccurring movement or event
which is observed.

When stable sets of active dynamic patches representing a
characterization of a reoccurring movement have been detected, the center
of the movement is identified. Static patches which are at a predetermined
distance d from the movement center, and which are obtained by a process
of static pattern recognition, are then analyzed to constitute, at a given
timestep, a set of active static patches S_i which are stored in a second
repertory R_s.

Stored static patches S_i of the second repertory R_s are spatially
integrated in order to detect and store stable sets of active static patches
representing a characterization of an object which is recurrently involved in
observed known reoccurring movements.

According to a particular embodiment, the process of static pattern
recognition and production of static patches is initiated after stable sets of
active dynamic patches representing a characterization of a reoccurring
movement have been detected.

According to another particular embodiment, the process of static
pattern recognition and production of static patches is initiated at the same
time as the process of dynamic movement recognition and production of
dynamic patches. When stable sets of active dynamic patches
representing a characterization of a reoccurring movement have been
detected, the process of static pattern recognition is continued exclusively
with static patches which are located in a restricted area of the image
centered on said identified movement center.

According to a specific embodiment, during the computation of the
distance between each pixel of two successive images (I_{t-1}, I_t), a filter
function f_th is used to keep only the most significant differences. This
yields a delta image Δ_t such that

Δ_t = f_th(||I_t - I_{t-1}||).

The filter function f_th may be a threshold function.

According to a particular embodiment, the step of extracting features
from the delta image Δ_t comprises computing a Gaussian color model of the
distribution of each color component.

According to a preferred embodiment, static patches are obtained on
the basis of salient points (x, y) in an image I_t provided at a synchronized
timestep t. When a salient point (x, y) is detected, a region R_{x,y}
corresponding to the surrounding pixels is defined, and features are
extracted from this region R_{x,y} to define a potential static patch S_{x,y}.



In such a case, the extraction of features from the region R_{x,y} may
comprise measuring the color change of a pixel compared to its neighbors
and computing a color model of the color distribution in the region R_{x,y}.

Successive couples of synchronized timesteps (t-1, t; T+t-1, T+t;
...) are separated by a period of time T which is equal to n times the
predetermined time delay τ_0, where n is a non-negative integer.

Preferably, however, successive couples of synchronized timesteps
(t-1, t; t, t+1; ...) are contiguous, without any time interruption between
two successive couples of synchronized timesteps.
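The following minimal Python sketch lists couples of synchronized timesteps to illustrate the timing scheme above. It is not taken from the patent. The function name `timestep_couples` is hypothetical. The sketch also assumes that "separated by a period of time T" means the idle gap between the end of one couple and the start of the next, so that n = 0 yields the contiguous couples (t-1, t; t, t+1; ...).

```python
def timestep_couples(t0, tau0, n, count):
    """Return `count` couples (start, end) of synchronized timesteps.

    Hypothetical helper: within a couple the two timesteps are tau0 apart;
    the idle gap between two successive couples is T = n * tau0, with n >= 0.
    """
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    gap = n * tau0  # T = n * tau_0
    couples = []
    start = t0
    for _ in range(count):
        end = start + tau0  # time difference tau within a couple equals tau_0
        couples.append((start, end))
        start = end + gap  # n = 0: the next couple starts where this one ends
    return couples


print(timestep_couples(0, 1, 0, 3))  # [(0, 1), (1, 2), (2, 3)]
print(timestep_couples(0, 1, 2, 2))  # [(0, 1), (3, 4)]
```

With contiguous couples, every image after the first serves as the second image of one couple and the first image of the next. The preferred embodiment therefore needs only one new frame per delay τ_0.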

The method according to the invention may further comprise the
step of detecting transitions between stable sets of active dynamic patches
representing a characterization of reoccurring movements, and the step of
constructing transition graphs for predicting complex events comprising a
sequence of identified movements.

The invention further relates to an adaptive artificial vision system 
comprising: 

- a clock for defining successive couples of synchronized timesteps
(t-1, t; t, t+1; ...) such that the time difference τ between the two
synchronized timesteps of a couple of synchronized timesteps is equal to
a predetermined time delay τ_0,

- inputting means for inputting images (I_{t-1}, I_t; I_t, I_{t+1}; ...)
provided by a camera at said synchronized timesteps (t-1, t; t, t+1; ...),

- first comparator means for comparing two successive images
(I_{t-1}, I_t; I_t, I_{t+1}; ...) inputted at each couple of synchronized
timesteps (t-1, t; t, t+1; ...) spaced by said predetermined time delay τ_0,
in order to obtain a delta image Δ_t which is the result of the computation
of the distance between each pixel of said two successive images,

- first memory means (M_d) for storing dynamic patches D_i
representing elementary visual parts for describing characterized
movements of objects,

- feature extraction means for extracting features from said delta
image Δ_t and producing a potential dynamic patch P_t,

- second comparator means for comparing a potential dynamic
patch P_t with dynamic patches previously recorded in said first memory
means (M_d),
- selection means for selecting the closest dynamic patch D_i in the
first memory means (M_d) or, if no sufficiently close dynamic patch exists
yet, for recording the potential dynamic patch P_t into the first memory
means. In this way a dynamic patch D_i is stored in the first memory means
for each comparison of two successive images (I_{t-1}, I_t; I_t, I_{t+1}; ...)
at each couple of synchronized timesteps (t-1, t; t, t+1; ...),

- first temporal integration means comprising computing means
for computing, during a time T_f1 corresponding to a predetermined number
N1 of couples of synchronized timesteps, the frequency of each dynamic
patch D_i stored in the first memory means, and threshold means for defining
a set of active dynamic patches comprising the dynamic patches D_i whose
frequency is higher than a predetermined threshold, and

- second temporal integration means comprising computing means
for computing, during a time T_f2 corresponding to a predetermined number
N2 of couples of synchronized timesteps, the frequency of each set of
defined active dynamic patches, and threshold means for defining a stable
set of dynamic patches corresponding to a reoccurring movement for each
set of active dynamic patches whose frequency is higher than a
predetermined threshold.

The adaptive artificial vision system further comprises means for
identifying the center of a reoccurring movement represented by a stable set
of active dynamic patches. It also comprises means for triggering static
pattern recognition for analyzing static patches which are at a predetermined
distance d from said center of a reoccurring movement.

BRIEF DESCRIPTION OF THE DRAWINGS 

Specific embodiments of the present invention will now be described 
by way of example only, with reference to the accompanying drawings, in 
which: 

- Figure 1 is a picture illustrating an example of a reoccurring event
which may be observed and detected by a system according to the present
invention,

- Figure 2 shows three sets of pictures illustrating the construction
process of three different dynamic patches contributing to recognizing the
event shown in Figure 1,
- Figure 3 is a picture given as an example for illustrating the
construction of a set of active static patches during the detection of an
event previously recognized by a dynamic patch set,

- Figure 4 shows examples of pictures of objects corresponding to
static patch sets constructed during the process of event and object
recognition,

- Figure 5 shows a picture illustrating a reference movement
identified by two stable patch sets and three further pictures illustrating
further movements identified by three patch sets which are linked to the
reference movement,

- Figure 6 shows three pictures illustrating different movements which
may give rise to transitions from one movement to another and enable the
construction of a transition graph, and

- Figure 7 is a diagram schematically illustrating an example of
architecture of an adaptive artificial vision system according to the
invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The adaptive artificial vision system according to the invention is
designed to be capable of bootstrapping object and event recognition.

The method and system according to the invention start with very
crude recognition capabilities. The system may be called impressionist since
it is adapted to perceive patches, which are elementary visual parts for
describing dynamic or static events. A dynamic event is usually defined by a
movement, whereas a static event is defined by an object.

The patches are constructed by the system to describe its visual
perception at the lowest level. As the system accumulates visual
experiences, it attempts to integrate patches together in order to discover
stable sets. This integration happens in both the temporal and the spatial
domain. After a while, the system becomes capable of recognizing
reoccurring movements. After reaching this stage, it tries to extract the
structure of objects involved in the movements it detects. By these means,
it begins to be able to recognize these objects even when they are not
moving. Stage after stage, starting from scratch, this artificial system learns
to structure its perception into more and more complex representations.
Stable patch sets constitute a higher-level representation of things
happening and present in the environment (events and objects).
Furthermore, the way dynamic patch sets are constructed directly influences
the way static patch sets are processed. In practice, good dynamic
recognition comes first and helps static recognition to get off the ground.

The process of dynamic recognition will now be described in a more 
detailed manner with reference to Figures 1, 2 and 7. 

A clock 101 defines successive couples of synchronized timesteps
such as t-1, t. The time difference τ between the two synchronized
timesteps of a couple, e.g. t-1, t, is equal to a predetermined time delay τ_0.

Successive couples of synchronized timesteps, e.g. t-1, t; T+t-1,
T+t, are separated by a period of time T which is equal to n times the
predetermined time delay τ_0, where n is a non-negative integer.

Preferably n = 0, and successive couples of synchronized timesteps
t-1, t; t, t+1 are contiguous, without any time interruption between two
successive couples of synchronized timesteps.

Images of the environment are obtained by a camera 103, such as a
video camera, and inputted to the system via an inputting circuit 102
synchronized by the clock 101. Digital images I_{t-1}, I_t, I_{t+1}, ... of
scenes viewed by the camera 103 are therefore inputted by the inputting
circuit 102 to a first comparator 104 at the defined successive timesteps
t-1, t, t+1, and so on.

In the following description it will be assumed that successive
couples of synchronized timesteps t-1, t; t, t+1 are contiguous (i.e. n = 0
and T = nτ_0 = 0).

A dynamic patch is obtained by comparing two successive images
I_{t-1}, I_t; I_t, I_{t+1} inputted to the first comparator 104 at each couple of
synchronized timesteps t-1, t; t, t+1 spaced by the predetermined time
delay τ_0.

A delta image Δ_t constitutes the result of the computation of the
distance between each pixel of two successive images I_{t-1}, I_t; I_t, I_{t+1}.
Its purpose is to characterize events (movements of objects) which occurred
between the two successive images.
A filter function, such as a threshold function f_th, may be included in
the first comparator 104 in order to keep only the most significant
differences.

In such a case, Δ_t = f_th(||I_t - I_{t-1}||) when a couple of images I_{t-1}, I_t
is considered.
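The computation of Δ_t can be sketched in plain Python as follows. This is an illustrative sketch, not the implementation of the first comparator 104. It rests on several assumptions:

- RGB images are stored as lists of rows of (r, g, b) tuples.
- The per-pixel distance is the Euclidean norm of the color difference.
- f_th is a hard threshold that keeps the distance where it exceeds `threshold` and sets it to 0 elsewhere.

`delta_image` and `threshold` are hypothetical names.

```python
import math


def delta_image(prev, curr, threshold):
    """Delta image f_th(||I_t - I_{t-1}||) for two equal-sized RGB images."""
    delta = []
    for row_prev, row_curr in zip(prev, curr):
        delta_row = []
        for p, c in zip(row_prev, row_curr):
            d = math.dist(p, c)  # Euclidean color distance of one pixel
            delta_row.append(d if d > threshold else 0.0)  # hard threshold f_th
        delta.append(delta_row)
    return delta


prev = [[(0, 0, 0), (10, 10, 10)]]
curr = [[(3, 4, 0), (10, 10, 10)]]
print(delta_image(prev, curr, threshold=1.0))  # [[5.0, 0.0]]
```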

A memory M_d is used for storing dynamic patches D_i representing
elementary visual parts for describing characterized movements of objects.

The delta image Δ_t is applied to a module 105 which extracts
features from the delta image Δ_t and produces a potential dynamic patch P_t.

The module 105 for extracting features of the delta image Δ_t may
use different known techniques.

For example, a Gaussian color model of the distribution of each color
component may be computed by the module 105.

Alternatively, the step of extracting features from the delta image Δ_t
in the module 105 may comprise using histograms to model the distribution
of color components, shape or texture.

In the case of a color model, a potential dynamic patch P_t represents
a color distribution of the delta image Δ_t.
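The sketch below shows one possible reading of the Gaussian color model. The text does not say which pixels the model is fitted to. This sketch assumes it is fitted to the colors of I_t at the pixels where the thresholded delta image is non-zero, so that a red ball and a yellow ball moving the same way give different patches. The function name `gaussian_color_patch` is hypothetical. The patch is returned as one (mean, variance) pair per color channel.

```python
from statistics import fmean, pvariance


def gaussian_color_patch(image, delta):
    """Fit a per-channel Gaussian (mean, variance) to the moving pixels of I_t.

    `image` is I_t as rows of (r, g, b) tuples and `delta` is the thresholded
    delta image of the same size. Returns None when no pixel has changed.
    """
    moving = [pixel
              for img_row, d_row in zip(image, delta)
              for pixel, d in zip(img_row, d_row)
              if d > 0]
    if not moving:
        return None
    channels = zip(*moving)  # one tuple of values per color channel
    return tuple((fmean(ch), pvariance(ch)) for ch in channels)


image = [[(200, 0, 0), (0, 0, 0)],
         [(100, 0, 0), (0, 0, 0)]]
delta = [[5.0, 0.0],
         [3.0, 0.0]]
print(gaussian_color_patch(image, delta))  # red channel: mean 150, variance 2500
```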

The memory M_d comprises a repertory R_d which is progressively
constructed in real time, starting from an initially empty repertory.

In a module 106, a potential dynamic patch P_t is compared with
dynamic patches previously recorded in the repertory R_d of the memory M_d.

A selection module 107 selects the closest dynamic patch D_i in the
memory M_d. If this is an initial step, or if no sufficiently close dynamic
patch exists yet, the selection module 107 instead records the potential
dynamic patch P_t in the repertory R_d within the memory M_d.

A dynamic patch D_i is therefore stored in the memory M_d for each
comparison of two successive images at each couple of synchronized
timesteps.
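The selection performed by modules 106 and 107 can be sketched as a nearest-neighbour search over the repertory R_d. This is a minimal sketch, not the patented implementation. It assumes patches are flat numeric feature vectors (for instance a flattened color model) and measures closeness with the Euclidean distance. A hypothetical `max_distance` parameter decides when no stored patch is "sufficiently close".

```python
import math


class PatchRepertory:
    """Repertory R_d, built in real time from an initially empty list."""

    def __init__(self, max_distance):
        self.max_distance = max_distance  # hypothetical closeness threshold
        self.patches = []

    def select_or_add(self, candidate):
        """Return the index i of the closest stored patch D_i, adding P_t if needed."""
        best_index, best_dist = None, math.inf
        for i, patch in enumerate(self.patches):
            d = math.dist(patch, candidate)
            if d < best_dist:
                best_index, best_dist = i, d
        if best_index is None or best_dist > self.max_distance:
            self.patches.append(tuple(candidate))  # P_t becomes a new patch
            best_index = len(self.patches) - 1
        return best_index


repertory = PatchRepertory(max_distance=1.0)
print(repertory.select_or_add((0.0, 0.0)))  # 0: empty repertory, P_t is added
print(repertory.select_or_add((0.5, 0.0)))  # 0: close enough to the first patch
print(repertory.select_or_add((5.0, 5.0)))  # 1: too far, added as a new patch
```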

In Figure 2, references 20A and 20B illustrate an example of two
successive images showing a first movement of a hand with a ball of a
certain color (e.g. red). Reference 20C shows the delta image Δ_t
corresponding to the successive images 20A, 20B. A dynamic patch D20 is
constructed from the delta image 20C and is stored in the dynamic patch
repertory R_d.



Similarly, references 22A and 22B illustrate an example of two
successive images showing a second movement of a hand with a ball of the
same red color. Reference 22C shows the delta image corresponding to
the images 22A, 22B, from which a dynamic patch D22 is constructed.
References 26A, 26B illustrate an example of another couple of
successive images showing a third movement of a hand with a ball of
another color, e.g. a yellow ball. Reference 26C shows the delta image
corresponding to the images 26A, 26B, from which a dynamic patch D26 is
constructed.

Dynamic patches D20, D22, D26 are all stored in the repertory R_d of
the memory M_d.

The module 108 of Figure 7 constitutes first temporal integration
means. It comprises computing means 108A for computing, during a time
T_f1 corresponding to a predetermined number N1 of couples of
synchronized timesteps, the frequency of each dynamic patch D_i stored in
the repertory R_d. Threshold means 108B define a set of active
dynamic patches made of the dynamic patches D_i whose frequency is higher
than a predetermined threshold.

Thus the frequency of each patch D_i of the repertory R_d during the
last N1 couples of timesteps is computed. All the patches D_i whose
frequency is above a given threshold are said to be active. At a given
timestep, there is a set of active patches which is assumed to
characterize the event experienced by the system.
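The first temporal integration can be sketched with a sliding window over the patch indices selected at the last N1 couples of timesteps. In this illustrative Python sketch the class name `ActivePatchTracker` is hypothetical. The threshold is assumed to be a relative frequency between 0 and 1, since the text only speaks of "a predetermined threshold".

```python
from collections import Counter, deque


class ActivePatchTracker:
    """First temporal integration: active patches over the last n1 couples."""

    def __init__(self, n1, threshold):
        self.history = deque(maxlen=n1)  # patch index selected at each couple
        self.threshold = threshold       # relative frequency in [0, 1]

    def update(self, patch_index):
        """Record the patch D_i selected for the latest couple; return the active set."""
        self.history.append(patch_index)
        counts = Counter(self.history)
        n = len(self.history)
        return frozenset(i for i, c in counts.items() if c / n > self.threshold)


tracker = ActivePatchTracker(n1=4, threshold=0.2)
for patch in (20, 22, 26, 20):
    active = tracker.update(patch)
print(sorted(active))  # [20, 22, 26]
```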

For example, at the timestep t, with picture 1 of Figure 1, three
patches may be considered to be active: A_t = {D20, D22, D26}. Picture 1 of
Figure 1 shows a moving hand with a yellow ball. It may be deemed to be
defined by the dynamic patches D20, D22, D26 corresponding to the delta
images 20C (first movement of the hand with a red ball), 22C (second
movement of the hand with a red ball) and 26C (third movement of the
hand with a yellow ball).

The system shown in Figure 7 further comprises a module 109
constituting second temporal integration means.

The module 109 comprises computing means 109A for computing,
during a time T_f2 corresponding to a predetermined number N2 of couples
of synchronized timesteps, the frequency of each set A_t of defined active
dynamic patches, such as {D20, D22, D26}.
Threshold means 109B define a stable set of dynamic patches
corresponding to a reoccurring movement for each set A_t of active
dynamic patches whose frequency is higher than a predetermined
threshold.

Thus the frequency of each active patch set is computed. If some
active patch set reoccurs sufficiently often, it is considered to be stable, and
it is then assumed to correspond to something interesting
happening regularly in the environment. An event of a higher level than the
patches is therefore created. Such a dynamic event is created and defined
by the set of patches involved in the reoccurring movement; in Figure 1, for
example, it may be defined as a "showing yellow ball" event. In the
example of Figure 1, the event is defined by the set of patches D20, D22, D26,
and these data are stored in the memory M_d.
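The second temporal integration applies the same counting one level up, to whole active sets A_t. The sketch below is an illustration rather than the patented design, and it makes several assumptions:

- The class name `StableSetDetector` is hypothetical.
- Frequencies are only evaluated once the window of N2 couples is full, so the very first set seen is not declared stable.
- Empty active sets are ignored.
- The threshold is again a relative frequency.

```python
from collections import Counter, deque


class StableSetDetector:
    """Second temporal integration over the active sets of the last n2 couples."""

    def __init__(self, n2, threshold):
        self.history = deque(maxlen=n2)  # active sets A_t, most recent last
        self.threshold = threshold       # relative frequency in [0, 1]
        self.stable_sets = set()         # recognized dynamic events

    def update(self, active_set):
        """Record A_t and return all stable sets detected so far."""
        self.history.append(frozenset(active_set))
        if len(self.history) < self.history.maxlen:
            return self.stable_sets  # window not full yet
        counts = Counter(self.history)
        for candidate, count in counts.items():
            if candidate and count / len(self.history) > self.threshold:
                self.stable_sets.add(candidate)
        return self.stable_sets


detector = StableSetDetector(n2=5, threshold=0.5)
for active in ({1}, {20, 22, 26}, {1}, {20, 22, 26}, {20, 22, 26}):
    events = detector.update(active)
print(events == {frozenset({20, 22, 26})})  # True
```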

As soon as such a dynamic event is constructed, a special detector is
created, which triggers each time an active patch set corresponds to this event.
Figure 1 shows an example of such a detection. The user is showing a
yellow ball to the camera. The system characterizes this event with the
active patch set A_t = {D20, D22, D26}. The system recognizes this set as a
dynamic event which has already appeared several times in the past.

To give a better insight into how the recognition works, Figure 2 shows, for
each of the patches involved in the detection, the two images that were at
the origin of the initial patch creation. D20 and D22 were created when a red
ball was shown to the camera, and D26 for a similar event involving, this
time, a yellow ball. The first two are characteristic of the special movement
of the hand performed when showing a round-shaped object; the last one is
more specific to the yellow color involved.

The description in patches can be seen as a very simple "language" that the
system builds to economically describe what is happening. Every time a new
event is perceived, the system recruits, as much as it can, existing patches
in its repertories to account for it.

Thus, if it is assumed that an interesting object can be moved, as
opposed to features which are only part of the background, the results
obtained by the dynamic recognition process may be used to bootstrap or
refocus a process of static recognition.

When a dynamic event constituted by a movement has been reliably
detected as being recurrent, the system immediately focuses its attention
on the movement center identified in the module 110 of Figure 7. Static
pattern recognition performed by a module 112 is then triggered by
triggering means 111, so that static patches which are at a predetermined
distance d from the center of the reoccurring movement are analyzed to
constitute, at a given timestep, a set of active patches.
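A minimal sketch of modules 110 and 111 follows. The text does not specify how the movement center is computed. This sketch assumes it is the centroid of the delta image Δ_t weighted by the pixel differences, and it reads "at a predetermined distance d" as "within distance d". Both function names are hypothetical.

```python
import math


def movement_center(delta):
    """Centroid (x, y) of the delta image, weighted by the pixel differences."""
    total = sx = sy = 0.0
    for y, row in enumerate(delta):
        for x, d in enumerate(row):
            total += d
            sx += d * x
            sy += d * y
    if total == 0:
        return None  # nothing moved
    return (sx / total, sy / total)


def points_near(center, points, d):
    """Keep the salient points (x, y) lying within distance d of the center."""
    return [p for p in points if math.dist(p, center) <= d]


delta = [[0, 0, 0],
         [0, 2, 2],
         [0, 0, 0]]
center = movement_center(delta)
print(center)  # (1.5, 1.0)
print(points_near(center, [(1, 1), (10, 10)], d=2.0))  # [(1, 1)]
```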

In Figure 3, a known event corresponding to a hand movement has
been detected (picture 30). In that case, four static patches are considered
to be part of it (S16, S29, S34, S40).

The system may compute the frequency of each active static patch
set in exactly the same manner as in the dynamic case. If some set is
regularly involved in known movements, it is assumed to correspond
to an existing object. A static object is then created. It is defined by the set
of static patches regularly perceived.

As soon as the first static object is constructed, the system looks
for it at each timestep. It searches the patches detected during the image
analysis, looking for the configuration corresponding to the definition of the
object. The object, which was first constructed because it was part of a
reoccurring movement, can now be detected even if it is not moving. In the
example of Figure 1, the system detects both a known movement (the hand
showing movement) and a known object (the ball).

Figure 4 shows examples of objects corresponding to static patch
sets which may be constructed by the system from pictures including
picture 30 of Figure 3. In this example, objects 11 to 14 represent a head,
a neck, a ball and a dog-shaped robot, respectively.

A specific example of a static recognition system will be described
below.

Static patches are based on salient points in the visual perception I_t.
For example, in a human face, salient points may be the eyes and the
mouth.

These salient points can be determined by, for instance, measuring 
the color change of a pixel compared to its neighbors. This is typically the 
method that is used for edge detection. 

When a salient point (x, y) is detected, the region R_{x,y} corresponding
to the surrounding pixels is defined:

R_{x,y} = regionaround(x, y)
As in the dynamic case, different features may be extracted, but,
according to a specific embodiment, a color model of the pixels of this region
is computed. This constitutes a potential static patch S_{x,y}:

S_{x,y} = colordistribution(R_{x,y})

This patch is compared to the existing patches in the memory M_s.
The closest patch S_i in the memory is selected. If no sufficiently close patch
exists, S_{x,y} is added to the memory M_s:

S_i = closestpatch(S_{x,y}, M_s)
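The three pseudocode steps above (regionaround, colordistribution, closestpatch), preceded by salient-point detection, can be sketched in plain Python. This is a deliberately small illustration, not the patented implementation. Its simplifications and assumptions are:

- Saliency is the mean color distance between a pixel and its 4-neighbors.
- The region is a square with a hypothetical `radius` of 1.
- The color model is reduced to the mean color of the region.
- `saliency_threshold` and `max_distance` are hypothetical parameters.

The lowercase function names mirror the pseudocode but are otherwise assumptions.

```python
import math


def saliency(image, x, y):
    """Mean color distance between pixel (x, y) and its 4-neighbors."""
    h, w = len(image), len(image[0])
    neighbors = [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < w and 0 <= y + dy < h]
    if not neighbors:
        return 0.0
    return sum(math.dist(image[y][x], image[ny][nx])
               for nx, ny in neighbors) / len(neighbors)


def region_around(image, x, y, radius=1):
    """R_{x,y}: pixels of the square of side 2*radius+1 centered on (x, y)."""
    h, w = len(image), len(image[0])
    return [image[j][i]
            for j in range(max(0, y - radius), min(h, y + radius + 1))
            for i in range(max(0, x - radius), min(w, x + radius + 1))]


def color_distribution(region):
    """S_{x,y}: mean color of the region (a minimal color model)."""
    return tuple(sum(channel) / len(region) for channel in zip(*region))


def closest_patch(candidate, memory, max_distance):
    """Index S_i of the closest stored patch; append the candidate if none is close."""
    if memory:
        best = min(range(len(memory)),
                   key=lambda k: math.dist(memory[k], candidate))
        if math.dist(memory[best], candidate) <= max_distance:
            return best
    memory.append(candidate)
    return len(memory) - 1


def static_patches(image, saliency_threshold, memory, max_distance):
    """Return the indices of the static patches describing image I_t."""
    patches = []
    for y in range(len(image)):
        for x in range(len(image[0])):
            if saliency(image, x, y) > saliency_threshold:
                s_xy = color_distribution(region_around(image, x, y))
                patches.append(closest_patch(s_xy, memory, max_distance))
    return patches


image = [[(0, 0, 0)] * 3 for _ in range(3)]
image[1][1] = (255, 255, 255)  # a single bright spot in a dark image
memory_s = []  # stands in for the memory M_s
print(static_patches(image, 200, memory_s, 10.0))  # [0]
```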

At each timestep t, the system analyzes the image I_t and produces a
set of static patches S_i describing it. The larger the number of static
patches, the better the recognition. But as this process runs at
every timestep, care must be taken not to produce an analysis which is
too time-consuming. Therefore, for performance's sake, it can be wise to
limit the number of possible static patches. This can be achieved in different
ways:

1. By doing the analysis at a lower resolution.

2. By introducing a focus mechanism that only concentrates on a
given part of the image at each timestep.

3. By specifying a minimal distance between salient points.

4. By prefiltering the image, extracting parts that are more likely to
contain interesting things (saturated or skin-toned colors).

5. By limiting the salient points to a fixed number and only taking the
most salient ones.
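Strategies 3 and 5 can be combined in a few lines. The sketch below uses a hypothetical function, `select_salient_points`. It keeps points greedily in decreasing order of saliency, skips any point closer than `min_distance` to a point already kept, and stops after `k` points. The greedy ordering is an assumption, not something the text prescribes.

```python
import math


def select_salient_points(scored_points, k, min_distance):
    """Keep at most k of the most salient points, at least min_distance apart.

    `scored_points` is a list of ((x, y), saliency) pairs.
    """
    kept = []
    for point, _score in sorted(scored_points, key=lambda ps: ps[1], reverse=True):
        if len(kept) == k:
            break  # strategy 5: fixed number of points
        if all(math.dist(point, q) >= min_distance for q in kept):
            kept.append(point)  # strategy 3: minimal spacing respected
    return kept


scored = [((0, 0), 5.0), ((1, 0), 9.0), ((10, 10), 7.0), ((20, 20), 1.0)]
print(select_salient_points(scored, k=3, min_distance=3.0))
# [(1, 0), (10, 10), (20, 20)]: (0, 0) is too close to the stronger (1, 0)
```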

An example of an already recognized event E1 defined by a stable
set of dynamic patches is illustrated in picture 50 of Figure 5. This
example of event E1 is constituted by a movement of a hand defined by
two dynamic patches D1 and D4.

Pictures 51, 52, 53 of Figure 5 illustrate other events E2, E3, E4, which
are defined by other stable sets of dynamic patches, namely D1-D4-D9,
D1-D4-D5 and D1-D4-D11, respectively.

Pictures 51, 52, 53 show further movements of a hand with an
object constituted by a ball held in the hand. Pictures 51 and 53
show very similar movements of a hand with balls of different colors. Pictures
51 and 52 show slightly different movements of a hand with balls of the same
color.

The definitions of events E2, E3, E4 all include the dynamic patches D1
and D4 which define event E1. They differ from the definition of event E1 only
by the addition of a third dynamic patch (D9, D5 and D11, respectively). Thus,
similarities may be identified between different events E1, E2, E3, E4 starting
from their structure defined by a stable set of dynamic patches. Events E2,
E3, E4 may thus be recognized as belonging to the same family of similar
events, since they are all structurally related to another autonomous event,
E1.
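The structural relation between E1 and E2, E3, E4 amounts to a set-inclusion test on their stable patch sets. The following sketch uses the patch sets given above. The function name `event_families` is illustrative, and so is the choice of a proper-subset test as the family criterion; neither comes from the source.

```python
def event_families(events):
    """Group events under each event whose patch set is a proper subset of theirs.

    `events` maps an event name to its stable set of dynamic patches.
    """
    families = {}
    for base, base_set in events.items():
        members = [name for name, s in events.items() if base_set < s]
        if members:
            families[base] = sorted(members)
    return families


events = {
    "E1": {"D1", "D4"},
    "E2": {"D1", "D4", "D9"},
    "E3": {"D1", "D4", "D5"},
    "E4": {"D1", "D4", "D11"},
}
print(event_families(events))  # {'E1': ['E2', 'E3', 'E4']}
```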

Figure 6 illustrates another way of structuring the event space. 

If an event E_i defined by a stable set of dynamic patches is
recognized at a timestep t1, the system according to the present invention
may observe the new situation created at a further timestep t2, at which an
event E_j defined by a stable set of dynamic patches is recognized.

The time difference between t2 and t1 is compared with a predetermined
time interval Δt. Δt is far greater than the predetermined time delay τ_0
between synchronized timesteps t-1, t, t+1, ..., and also greater than the
integration time of a stable set of dynamic patches. If the time difference is
smaller than Δt, a transition is identified between events E_i and E_j, and a
transition arc is created. Progressively, a transition graph is constructed
with transition arcs between events E_i, E_j, ... defining movement types
identified by stable sets of dynamic patches.

It may be noted that the same movement may be continued during a
certain amount of time, and a transition arc may exist between the same
event E_i observed at different timesteps t1, t2. Transition arcs may also
be constructed between different events E_i, E_j observed at successive
timesteps t1, t2.

After a certain amount of time during which different events have been
observed several times, the frequencies of the observed transitions may be
used to predict the probability that each identified transition occurs when
the same situation is detected.
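The construction of transition arcs and the estimation of their probabilities from observed frequencies can be sketched as follows. This sketch rests on several assumptions:

- The class name `TransitionGraph` is hypothetical.
- Times are plain numbers.
- Only the most recent recognition is used as the source of an arc.
- `max_interval` plays the role of the interval Δt.

```python
from collections import Counter, defaultdict


class TransitionGraph:
    """Transition arcs between recognized events, with observed frequencies."""

    def __init__(self, max_interval):
        self.max_interval = max_interval  # plays the role of the interval Δt
        self.arcs = defaultdict(Counter)  # arcs[E_i][E_j] = observation count
        self.last = None  # (event, time) of the previous recognition

    def observe(self, event, time):
        """Record a recognized event; add an arc if the previous one is recent enough."""
        if self.last is not None:
            prev_event, prev_time = self.last
            if time - prev_time < self.max_interval:
                self.arcs[prev_event][event] += 1
        self.last = (event, time)

    def probabilities(self, event):
        """Estimated probability of each event following `event`."""
        counts = self.arcs.get(event, Counter())
        total = sum(counts.values())
        return {e: c / total for e, c in counts.items()} if total else {}


graph = TransitionGraph(max_interval=10)
for event, time in (("E5", 0), ("E5", 5), ("E7", 8), ("E5", 30), ("E5", 35)):
    graph.observe(event, time)
# E5 -> E5 twice and E5 -> E7 once; the 22-unit gap before t = 30 creates no arc
print(graph.probabilities("E5"))
```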

Figure 6 shows three pictures 61, 62, 63 illustrating three different
events E5, E6, E7, respectively.
In the example of Figure 6, picture 61 shows a hand holding an
object from the top and imparting a to-and-fro movement to it. This movement
constitutes the event E5, defined by a stable set of dynamic patches D4, D2.

Picture 62 shows a hand holding an object from the bottom and
imparting a shaking movement to it. This movement constitutes the event E6,
defined by a stable set of dynamic patches D3, D2.

Picture 63 shows a hand holding an object from the top and 
imparting a spinning movement, this movement constituting the event E7 
defined by a stable set of dynamic patches D4, D3, D2. 

Transition arcs are represented in Figure 6 with the probability of
each transition, computed from the transitions already observed and
their frequencies.

In this example, the probability that an event E5 follows itself is
70%, whereas the probability that an event E7 follows an event E5 is
30% and the probability that an event E6 follows an event E5 is 10%.
The probability that an event E6 follows itself is 85%, whereas the
probability that an event E5 follows an event E6 is 10% and the
probability that an event E7 follows an event E6 is 5%. Finally, the
probability that an event E7 follows itself is 65%, whereas the
probability that an event E5 follows an event E7 is 30% and the
probability that an event E6 follows an event E7 is 5%.

The system according to the invention does not need to store huge
amounts of data, since only patches or sets of patches are stored in the
memories M_d, M_s.

Moreover, if needed, some prototype patches D_i which are not
used in stable sets of patches may be eliminated at regular intervals to
reduce the amount of data stored.
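The periodic elimination of unused prototype patches can be sketched as a filter over the repertory. In this illustrative sketch the function name `prune_patches` is hypothetical. Patches are kept in a dictionary keyed by patch identifier, and every patch absent from all stable sets is dropped. A real system might want to spare recently created patches that have not yet had time to join a stable set.

```python
def prune_patches(patches, stable_sets):
    """Keep only the patches that belong to at least one stable set."""
    used = set().union(*stable_sets)  # identifiers referenced by stable sets
    return {pid: vector for pid, vector in patches.items() if pid in used}


patches = {1: (0.1, 0.2), 4: (0.3, 0.4), 7: (0.9, 0.9)}
print(prune_patches(patches, [{1, 4}, {1, 4, 9}]))  # {1: (0.1, 0.2), 4: (0.3, 0.4)}
```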

It has been checked that the system according to the invention may
be implemented in embedded systems without much processing power.
For example, the system may be incorporated in a robot equipped
with a 492x362-pixel color CCD camera and connected to a separate
computer through a wireless LAN. Such a computer may be, for example, a
SONY VaioNote PCG-C1XN running Linux.

The robot evolves in unknown environments such as a house. In
such conditions, traditional segmentation methods give very poor results,
whereas the invention provides a significant improvement. In particular, if
the system is used in an application in which the robot tries to learn the
names of objects, a good mechanism for sharing attention about a particular
object is a crucial component of the system. In a robot equipped with a
system according to the invention, the impressionist vision module is
coupled with an attention manager that controls the head of the robot. The
robot may thus turn its head towards things that move and things it knows.
As the bootstrapping process goes on, the robot displays its progress in
recognition by showing a sharper attention behavior.
CLAIMS

1. An adaptive artificial vision method comprising the following 
steps: 

(a) defining successive couples of synchronized timesteps (t-1, t; 
t, t+1; ...) such that the time difference τ between two synchronized 
timesteps (t-1, t; t, t+1; ...) of a couple of synchronized timesteps is equal to 
a predetermined time delay τ0, 

(b) comparing two successive images (It-1, It; It, It+1, ...) at each 
couple of synchronized timesteps (t-1, t; t, t+1; ...) spaced by said 
predetermined time delay τ0 for obtaining a delta image Δt which is the 
result of the computation of the distance between each pixel of said two 
successive images (It-1, It; It, It+1, ...) in view of characterizing movements 
of objects between said two successive images (It-1, It; It, It+1, ...), 

(c) extracting features from said delta image Δt for obtaining a 
potential dynamic patch Pt which is compared with dynamic patches 
previously recorded in a first repertory Rd which is progressively constructed 
in real time from an initially empty repertory, 

(d) selecting the closest dynamic patch Di in the first repertory Rd 
or, if no sufficiently close dynamic patch yet exists, adding the potential 
dynamic patch Pt to the first repertory Rd, and therefore obtaining and 
storing a dynamic patch Di from the comparison of two successive images 
(It-1, It; It, It+1, ...) at each couple of synchronized timesteps (t-1, t; t, 
t+1; ...), and 

(e) temporally integrating stored dynamic patches Di of the first 
repertory Rd in order to detect and store stable sets of active dynamic 
patches representing a characterization of a reoccurring movement or event 
which is observed. 
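Steps (c) and (d) amount to an incremental nearest-prototype memory that starts empty and grows only when no stored patch is close enough. The following is a minimal Python sketch under stated assumptions: each patch is a plain feature vector, "closest" is measured with Euclidean distance, and "sufficiently close" is a fixed `max_distance`. The claim specifies none of these, and the class `DynamicRepertory` is hypothetical.

```python
import math


class DynamicRepertory:
    """Repertory Rd built incrementally from an empty state (hypothetical sketch).

    Each dynamic patch is represented here by a feature vector; the Euclidean
    distance and the fixed `max_distance` are illustrative assumptions.
    """

    def __init__(self, max_distance):
        self.max_distance = max_distance
        self.patches = []  # index i holds the prototype of patch Di

    def match_or_add(self, features):
        """Return the index of the closest patch, adding a new one if none is close enough."""
        best_index, best_dist = None, math.inf
        for i, proto in enumerate(self.patches):
            d = math.dist(features, proto)
            if d < best_dist:
                best_index, best_dist = i, d
        if best_index is not None and best_dist <= self.max_distance:
            return best_index
        self.patches.append(list(features))
        return len(self.patches) - 1


rd = DynamicRepertory(max_distance=0.5)
print(rd.match_or_add([1.0, 0.0]))  # empty repertory: creates D0
print(rd.match_or_add([1.1, 0.1]))  # close to D0: reuses it
print(rd.match_or_add([5.0, 5.0]))  # far from D0: creates D1
```

The linear scan keeps the sketch short; a large repertory would likely call for an indexed nearest-neighbour search instead.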

2. A method according to claim 1, wherein, when stable sets of 
active dynamic patches representing a characterization of a reoccurring 
movement have been detected, the center of the movement is identified 
and static patches which are at a predetermined distance d from the 
movement center and are obtained by a process of static pattern 
recognition are analyzed to constitute, at a given timestep, a set of active 
static patches Si which are stored in a second repertory Rs. 
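Claim 2 does not state how the movement center is identified. The following is one possible sketch in Python: it takes the centroid of the changed pixels in a delta image and keeps the candidate points that lie within distance d of it. Both the centroid rule and the reading of "at a predetermined distance d" as "within d" are assumptions, and the function names are hypothetical.

```python
import math


def movement_center(delta):
    """Centroid (x, y) of the non-zero pixels of a delta image (assumed rule).

    delta: list of rows, each a list of numbers (0 means "no change").
    Returns None when no pixel changed.
    """
    xs, ys = [], []
    for y, row in enumerate(delta):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def points_near(center, points, d):
    """Keep the candidate points lying within distance d of the movement center."""
    return [p for p in points if math.dist(p, center) <= d]


delta = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
c = movement_center(delta)
print(c)
print(points_near(c, [(1, 1), (3, 3), (0, 0)], d=1.0))
```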

3. A method according to claim 1, wherein stored static patches Si 
of the second repertory Rs are spatially integrated in order to detect and 
store stable sets of active static patches representing a characterization of 
an object which is recurrently involved in observed known reoccurring 
movements. 

4. A method according to claim 2 or claim 3, wherein the process 
of static pattern recognition and production of static patches is initiated 
after stable sets of active dynamic patches representing a characterization 
of a reoccurring movement have been detected. 

5. A method according to claim 2 or claim 3, wherein the process 
of static pattern recognition and production of static patches is initiated at 
the same time as the process of dynamic movement recognition and 
production of dynamic patches and, when stable sets of active dynamic 
patches representing a characterization of a reoccurring movement have 
been detected, the process of static pattern recognition is continued 
exclusively with static patches which are located in a restricted area of the 
image which is centered on said identified movement center. 

6. A method according to any one of claims 1 to 5, wherein during 
the computation of the distance between each pixel of two successive 
images (It-1, It), a filter function fth is used to keep only the most significant 
differences and therefore obtain a delta image Δt such that 

$$\Delta_t = f_{th}\left(\lVert I_{t-1} - I_t \rVert\right)$$

7. A method according to claim 6, wherein the filter function fth is 
a threshold function. 
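Claims 6 and 7 describe a per-pixel distance followed by a threshold filter. The following is a minimal Python sketch under stated assumptions: images are lists of rows of (R, G, B) tuples, the per-pixel distance is Euclidean in RGB space, and fth keeps the distance when it exceeds the threshold and returns 0 otherwise. A binary output would be an equally valid reading of "threshold function", and the function name `delta_image` is hypothetical.

```python
import math


def delta_image(prev, curr, threshold):
    """Delta image: f_th applied to ||I_{t-1} - I_t||, computed pixel by pixel.

    prev, curr: images as lists of rows of (R, G, B) tuples, same size.
    The per-pixel distance is Euclidean in RGB space (an assumption); f_th
    keeps the distance when it exceeds `threshold` and sets it to 0 otherwise.
    """
    out = []
    for row_prev, row_curr in zip(prev, curr):
        out_row = []
        for p, c in zip(row_prev, row_curr):
            dist = math.dist(p, c)
            out_row.append(dist if dist > threshold else 0.0)
        out.append(out_row)
    return out


prev = [[(0, 0, 0), (10, 10, 10)]]
curr = [[(0, 0, 3), (100, 10, 10)]]
print(delta_image(prev, curr, threshold=20))
```

In this example the small change on the first pixel (distance 3) is filtered out, while the large change on the second pixel (distance 90) is kept.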
8. A method according to any one of claims 1 to 7, wherein the 
step of extracting features from the delta image Δt comprises computing a 
Gaussian color model of the distribution for each color component. 

9. A method according to any one of claims 1 to 7, wherein the 
step of extracting features from the delta image Δt comprises using 
histograms to model the distribution for color components, shape or 
texture. 
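Claims 8 and 9 name two kinds of feature models for the changed region. The following is a minimal Python sketch of both, restricted to color: a per-channel mean and standard deviation (Gaussian model) and a per-channel histogram. The pixel list, the 4-bin layout, the 0 to 255 value range and the function names are assumptions for illustration. Shape and texture histograms are not shown.

```python
import statistics


def gaussian_color_model(pixels):
    """Mean and standard deviation of each color component.

    pixels: list of (R, G, B) tuples, e.g. the pixels kept in the delta image.
    Returns [(mean, std), ...], one entry per channel.
    """
    channels = list(zip(*pixels))
    return [(statistics.fmean(ch), statistics.pstdev(ch)) for ch in channels]


def channel_histograms(pixels, bins=4, max_value=256):
    """Per-channel histograms with `bins` equal-width bins over [0, max_value)."""
    width = max_value / bins
    hists = [[0] * bins for _ in range(len(pixels[0]))]
    for px in pixels:
        for ch, value in enumerate(px):
            hists[ch][min(int(value // width), bins - 1)] += 1
    return hists


pixels = [(200, 10, 10), (220, 30, 10), (240, 20, 10)]
print(gaussian_color_model(pixels))
print(channel_histograms(pixels))
```

The Gaussian model is compact (two numbers per channel), while histograms can represent multimodal color distributions at the cost of more storage per patch.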

10. A method according to any one of claims 2 to 5, wherein static 
patches are obtained on the basis of salient points (x,y) in an image It 
provided at a synchronized timestep t: when a salient point (x,y) is detected, 
a region Rx,y corresponding to the surrounding pixels is defined and 
features are extracted from this region Rx,y to define a potential static patch 
Sx,y. 

11. A method according to claim 10, wherein the extraction of 
features from the region Rx,y comprises measuring the color change of a 
pixel compared to its neighbors and computing a color model of the color 
distribution in the region Rx,y. 
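Claims 10 and 11 can be sketched with a simple color-contrast detector. In the following Python sketch, a pixel counts as salient when its largest RGB distance to a 4-neighbour exceeds a threshold. The region Rx,y is a square window around that pixel, and the color model of the region is reduced to its mean color. The neighbourhood, the window radius, the mean-color model and all function names are assumptions rather than the described implementation.

```python
import math


def color_change(img, x, y):
    """Largest RGB distance between pixel (x, y) and its 4-neighbours."""
    h, w = len(img), len(img[0])
    best = 0.0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h:
            best = max(best, math.dist(img[y][x], img[ny][nx]))
    return best


def salient_points(img, threshold):
    """Pixels whose color change relative to their neighbours exceeds `threshold`."""
    return [(x, y)
            for y in range(len(img))
            for x in range(len(img[0]))
            if color_change(img, x, y) > threshold]


def static_patch(img, x, y, radius=1):
    """Potential static patch S_{x,y}: mean color of the square region R_{x,y}."""
    region = [img[j][i]
              for j in range(max(0, y - radius), min(len(img), y + radius + 1))
              for i in range(max(0, x - radius), min(len(img[0]), x + radius + 1))]
    return tuple(sum(c) / len(region) for c in zip(*region))


black, red = (0, 0, 0), (255, 0, 0)
img = [
    [black, black, black],
    [black, red, black],
    [black, black, black],
]
pts = salient_points(img, threshold=100)
print(pts)
print(static_patch(img, 1, 1))
```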

12. A method according to any one of claims 1 to 11, wherein 
successive steps of synchronized timesteps (t-1, t; T+t-1, T+t; ...) are 
separated by a period of time T which is equal to n times the 
predetermined time delay τ0, where n is an integer which is positive or 
equal to zero. 

13. A method according to claim 12, wherein successive couples 
of synchronized timesteps (t-1, t; t, t+1; ...) are contiguous without any time 
interruption between two successive couples of synchronized timesteps 
(t-1, t; t, t+1). 
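The timing of claims 12 and 13 can be made concrete with a small generator. Each couple spans τ0, and successive couples are shifted by T = n·τ0. Reading claim 13 as the case n = 1, where each couple shares a timestep with the next, is an interpretation. The function name and the integer time units are assumptions.

```python
def timestep_couples(t0, tau0, n, count):
    """Yield `count` couples (t - tau0, t) of synchronized timesteps.

    Within a couple the two timesteps are tau0 apart; successive couples are
    shifted by T = n * tau0 (n a non-negative integer). With n = 1 the couples
    share a timestep and follow each other without interruption.
    """
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    period = n * tau0
    for k in range(count):
        t = t0 + k * period
        yield (t - tau0, t)


print(list(timestep_couples(t0=1, tau0=1, n=1, count=3)))  # contiguous couples
print(list(timestep_couples(t0=1, tau0=1, n=3, count=3)))  # gaps between couples
```

A larger n lowers the processing load, because fewer image pairs are compared per unit of time, at the cost of missing movements that occur between couples.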

14. A method according to any one of claims 1 to 13, wherein it 
further comprises the step of detecting transitions between stable sets of 
active dynamic patches representing a characterization of reoccurring 
movements and of constructing transition graphs for predicting complex 
events comprising a sequence of identified movements. 
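Claim 14 can be sketched as follows. Count the transitions between successive labels of detected stable sets, normalize the counts into probabilities, and follow the most probable edges to predict a sequence of movements. In this Python sketch the observed sequence is illustrative data only, and the function names are hypothetical. The claim does not specify how the graph is built or used.

```python
from collections import Counter, defaultdict


def build_transition_graph(events):
    """Estimate transition probabilities from a sequence of event labels.

    events: labels of stable sets detected one after the other.
    Returns {a: {b: P(b follows a)}}.
    """
    counts = defaultdict(Counter)
    for a, b in zip(events, events[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}


def predict_sequence(graph, start, length):
    """Follow the most probable transition from `start` for `length` steps."""
    seq = [start]
    for _ in range(length):
        successors = graph.get(seq[-1])
        if not successors:
            break
        seq.append(max(successors, key=successors.get))
    return seq


# Illustrative sequence of detected events (not measured data).
observed = ["E5", "E6", "E6", "E7", "E5", "E6", "E6", "E6", "E7"]
graph = build_transition_graph(observed)
print(graph["E6"])
print(predict_sequence(graph, "E5", 3))
```

A greedy rollout like this one tends to stay in strong self-loops such as E6 to E6. Sampling successors in proportion to their probabilities would give more varied predictions.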

15. An adaptive artificial vision system comprising: 

- a clock (101) for defining successive couples of synchronized 
timesteps (t-1, t; t, t+1; ...) such that the time difference τ between two 
synchronized timesteps (t-1, t; t, t+1; ...) of a couple of synchronized 
timesteps is equal to a predetermined time delay τ0, 

- inputting means (102) for inputting images (It-1, It; It, It+1, ...) 
provided by a camera (103) at said synchronized timesteps (t-1, t; t, 
t+1; ...), 

- first comparator means (104) for comparing two successive 
images (It-1, It; It, It+1, ...) at each couple of synchronized 
timesteps (t-1, t; t, t+1; ...) spaced by said predetermined time delay τ0 for 
obtaining a delta image Δt which is the result of the computation of the 
distance between each pixel of said two successive images (It-1, It; It, It+1, ...), 

- first memory means (Md) for storing dynamic patches Di 
representing elementary visual parts for describing characterized 
movements of objects, 

- feature extraction means (105) for extracting features from said 
delta image Δt and producing a potential dynamic patch Pt, 

- second comparator means (106) for comparing the potential 
dynamic patch Pt with dynamic patches previously recorded in said first 
memory means (Md), 

- selection means for selecting the closest dynamic patch Di in the 
first memory means (Md) or, if no sufficiently close dynamic patch yet 
exists, for recording the potential dynamic patch Pt in the first memory 
means (Md), so that a dynamic patch is stored in the first memory means for 
each comparison of two successive images (It-1, It; It, It+1, ...) at each 
couple of synchronized timesteps (t-1, t; t, t+1; ...), 

- first temporal integration means comprising computing 
means (108A) for computing, during a time Tf1 corresponding to a 
predetermined number N1 of couples of synchronized timesteps, the 
frequency of each dynamic patch Di stored in the first memory means, and 
threshold means (108B) for defining a set of active dynamic patches 
comprising dynamic patches Di whose frequency is higher than a 
predetermined threshold, and 

- second temporal integration means (107) comprising computing 
means (109A) for computing, during a time Tf2 corresponding to a 
predetermined number N2 of couples of synchronized timesteps, the 
frequency of each set of defined active dynamic patches, and threshold 
means (109B) for defining a stable set of dynamic patches corresponding to 
a reoccurring movement for each set of active dynamic patches whose 
frequency is higher than a predetermined threshold. 
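The two temporal integration stages can be sketched as two frequency filters. The first keeps the patches selected often enough over N1 couples. The second keeps the active sets that recur often enough. The following Python sketch rests on assumptions: active sets are recomputed over consecutive, non-overlapping windows of N1 couples, and the stable-set frequency is counted over those windows. The thresholds, the data and the function names are illustrative.

```python
from collections import Counter


def active_patches(patch_ids, threshold):
    """First integration: patches whose frequency over N1 couples exceeds `threshold`.

    patch_ids: index of the patch Di selected at each of the last N1 couples.
    """
    n1 = len(patch_ids)
    freq = Counter(patch_ids)
    return frozenset(p for p, c in freq.items() if c / n1 > threshold)


def stable_sets(active_history, threshold):
    """Second integration: active sets whose frequency exceeds `threshold`."""
    n2 = len(active_history)
    freq = Counter(active_history)
    return [s for s, c in freq.items() if c / n2 > threshold]


# Illustrative data: patch indices selected over windows of N1 = 5 couples.
windows = [
    [0, 1, 0, 1, 2],
    [1, 0, 0, 1, 3],
    [4, 4, 4, 5, 6],
    [0, 1, 1, 0, 0],
]
history = [active_patches(w, threshold=0.3) for w in windows]
print(history)
print(stable_sets(history, threshold=0.5))
```

Using `frozenset` makes each active set hashable, so identical sets from different windows are counted together in the second stage.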



16. A system according to claim 15, wherein it further comprises 
means (110) for identifying the center of a reoccurring movement 
represented by a stable set of active dynamic patches and means (111) for 
triggering static pattern recognition (112) for analyzing static patches which 
are at a predetermined distance d from said center of a reoccurring 
movement. 
ABSTRACT 



The adaptive artificial vision method comprises the following 
steps: (a) defining successive couples of timesteps (t-1, t; t, t+1; ...) 
synchronized by a clock (101), (b) comparing two successive images (It-1, 
It; It, It+1, ...) from an input device (102, 103) at each couple of 
synchronized timesteps (t-1, t; t, t+1; ...) spaced by a predetermined time 
delay τ0 for obtaining a delta image Δt which is the result of the 
computation of the distance between each pixel of the two successive 
images (It-1, It; It, It+1, ...) in view of characterizing movements of objects, 
(c) extracting features from the delta image Δt for obtaining a potential 
dynamic patch Pt which is compared with dynamic patches previously 
recorded in a repertory which is progressively constructed in real time from 
an initially empty repertory, (d) selecting the closest dynamic patch Di in the 
repertory or, if no sufficiently close dynamic patch yet exists, adding the 
potential dynamic patch Pt to the repertory, and therefore obtaining and 
storing a dynamic patch Di from the comparison of two successive images 
(It-1, It; It, It+1, ...) at each couple of synchronized timesteps (t-1, t; t, 
t+1; ...), and (e) temporally integrating stored dynamic patches Di of the 
repertory in order to detect and store stable sets of active dynamic patches 
representing a characterization of a reoccurring movement or event which is 
observed. A process of static pattern recognition may then be used 
efficiently. 



(Figure 7) 